A comparison of methods for speaker-dependent pronunciation tuning for text-to-speech synthesis

Authors

Gabriel Webster

Tina Burrows

Kate Knill

Abstract

Unit-based text-to-speech (TTS) systems typically use a set of phonetically transcribed speech recordings to create a large inventory of phonetic units. During synthesis, pronunciations are generated for the input text and used to guide the selection of a sequence of phonetic units. To maximize the quality of the synthesized speech, the style of these system pronunciations must match the style of the phonetic transcriptions of the recorded speech database. Furthermore, because different speakers have different speech characteristics, supporting multiple speakers for a single language generally requires applying a speaker-dependent mapping to speaker-independent pronunciations. This paper investigates three automatic methods for this process of speaker-dependent pronunciation tuning: word N-grams, decision trees, and transformation-based learning. Transformation-based learning achieved the best results, reducing the phone error rate of the text pronunciations, measured against the speech transcriptions, by 26% relative to the error rate of the unmodified text pronunciations.
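The abstract does not describe how transformation-based learning is applied to pronunciation tuning, so the following is only a minimal, generic Brill-style sketch under stated assumptions, not the authors' implementation. It assumes each text pronunciation is already aligned phone-by-phone with the speaker's transcription (so only substitutions count as errors), uses a single hypothetical rule template ("rewrite phone X as Y when the previous or next phone is Z"), and greedily keeps the rule with the largest net error reduction until no rule helps. The names `Rule`, `learn_rules`, and `phone_error_rate`, and the ARPAbet-style phones in the toy data, are illustrative.

```python
from dataclasses import dataclass

Phones = list[str]


@dataclass(frozen=True, order=True)
class Rule:
    """Hypothetical template: rewrite `src` as `dst` when a neighbour matches.

    `side` is -1 to test the previous phone or +1 to test the next one;
    `ctx` is the required neighbour, with "#" marking a word boundary.
    """
    src: str
    dst: str
    side: int
    ctx: str

    def apply(self, seq: Phones) -> Phones:
        # Contexts are read from the input sequence, so all matches are
        # rewritten simultaneously rather than left to right.
        out = list(seq)
        for i, phone in enumerate(seq):
            j = i + self.side
            neighbour = seq[j] if 0 <= j < len(seq) else "#"
            if phone == self.src and neighbour == self.ctx:
                out[i] = self.dst
        return out


def errors(hyp: Phones, ref: Phones) -> int:
    # Assumption: sequences are pre-aligned, so errors are substitutions only.
    if len(hyp) != len(ref):
        raise ValueError("sketch assumes phone-by-phone aligned sequences")
    return sum(h != r for h, r in zip(hyp, ref))


def phone_error_rate(hyps: list[Phones], refs: list[Phones]) -> float:
    total = sum(len(r) for r in refs)
    return sum(errors(h, r) for h, r in zip(hyps, refs)) / total


def learn_rules(texts: list[Phones], refs: list[Phones],
                max_rules: int = 10, min_gain: int = 1):
    """Greedy TBL: start from the text pronunciations, add one rule per pass."""
    hyps = [list(t) for t in texts]
    rules: list[Rule] = []
    for _ in range(max_rules):
        # Propose candidate rules from every remaining error and its neighbours.
        candidates = set()
        for h, r in zip(hyps, refs):
            for i, (hp, rp) in enumerate(zip(h, r)):
                if hp != rp:
                    for side in (-1, 1):
                        j = i + side
                        ctx = h[j] if 0 <= j < len(h) else "#"
                        candidates.add(Rule(hp, rp, side, ctx))
        # Score = errors removed minus errors introduced over the whole corpus;
        # sorting makes tie-breaking deterministic.
        best, best_gain = None, min_gain - 1
        for rule in sorted(candidates):
            gain = sum(errors(h, r) - errors(rule.apply(h), r)
                       for h, r in zip(hyps, refs))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule removes at least `min_gain` errors
            break
        rules.append(best)
        hyps = [best.apply(h) for h in hyps]
    return rules, hyps


if __name__ == "__main__":
    # Toy, made-up data: this hypothetical speaker flaps /t/ before /er/.
    texts = [["w", "ao", "t", "er"], ["b", "ah", "t", "er"], ["t", "ao", "k"]]
    speech = [["w", "ao", "dx", "er"], ["b", "ah", "dx", "er"], ["t", "ao", "k"]]
    rules, tuned = learn_rules(texts, speech)
    print(rules)
    print(phone_error_rate(texts, speech), "->", phone_error_rate(tuned, speech))
```

On the toy data the learner prefers the rule conditioned on the following /er/ because it fixes both flapped tokens at once without disturbing the word-initial /t/ in the third word. A realistic system would also need an alignment step that handles insertions and deletions, and richer rule templates; both are outside what the abstract states.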


Similar resources

Linguistic Processor Training on Speaker Data for Unit Selection Text-to-Speech

This paper describes an approach to synthesizing personalized speech while maintaining not only speaker voice but also speaker pronunciation peculiarities. Personalization is realized by means of pronunciation models trained on speaker data contained in his/her speech database. Untrained models allow speech to be synthesized in a neutral, normative style. On the segmental level, the transcription mode...


Pronunciation lexicon adaptation for TTS voice building

This paper describes reducing phone label errors in TTS voice building by means of modeling of speaker pronunciation variants. Each speaker has his or her own unique pronunciations (and context-dependent variations), so that no one standard lexicon is able to cover all of the speaker’s variations. Creating speaker-dependent pronunciation lexicons for automatic speech labeling of our TTS voice d...


Advantages of Using Computer in Teaching English Pronunciation

Pronunciation continues to grow in importance because of its key roles in speech recognition, speech perception, and speaker identity. Computers are increasingly being used in teaching English pronunciation to improve its quality. The purpose of this paper is to discuss the advantages of using computers in English pronunciation instruction. Understanding the advantages of computers is an important ...


Techniques for accurate automatic annotation of speech waveforms

We describe techniques used in the development of an automatic annotation system for use with a concatenative text-to-speech synthesis system. The goal of the system is to generate automatically from word-level transcriptions annotations that result in synthetic speech of the same quality as that produced from hand-labelled speech. Our approach in this work has been to use the standard techniqu...


Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...



Publication date: 2005
